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We propose and study a version of simulated annealing (SA) on continu¬ 
ous state spaces based on {t, s)/j-sequences. The parameter i? G N regulates 
the degree of randomness of the input sequence, with the case R = 0 corres¬ 
ponding to IID uniform random numbers and the limiting case i? = oo to 
(t, s)-sequences. Our main result, obtained for rectangular domains, shows 
that the resulting optimization method, which we refer to as QMC-SA, con¬ 
verges almost surely to the global optimum of the objective function ip for 
any R € N. When ip is univariate, we are in addition able to show that the 
completely deterministic version of QMC-SA is convergent. A key property 
of these results is that they do not require objective-dependent conditions on 
the cooling schedule. As a corollary of our theoretical analysis, we provide a 
new almost sure convergence result for SA which shares this property under 
minimal assumptions on ip. We further explain how our results in fact apply 
to a broader class of optimization methods including for example threshold 
accepting, for which to our knowledge no convergence results currently ex¬ 
ist. We hnally illustrate the superiority of QMC-SA over SA algorithms in a 
numerical study. 

Keywords: Global optimization; Quasi-Monte Carlo; Randomized quasi- 
Monte Carlo; Simulated annealing; Threshold accepting 


1. Introduction 


Simulated annealing (SA) belongs to the class of stochastic optimization techniques, a 
popular suite of tools to hnd the global optimum of multi-modal and/or non-differentiable 
functions dehned on continuous sets (Geman and Hwang [1986 ). For instance, SA has 
been successfully used to solve complicated continuous optimization problems arising in 
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signal processing (Chen and Luk 1999), antenna array synthesis (Girard et al. 
and thermodynamics ([Zhang et ^ 2011); 


2001 ) 


see 


Locatelli (2002) and references therein for 


more applications of SA on continuous state spaces. In addition, since many inferential 
problems amount to optimizing an objective function, SA has proved to be useful both 


with frequentist (see, e.g., Goffe et al. 

1994 

Rubenthaler et al. 

2009 

Chen et al. 

2011 

and Bayesian (see, e.g.. 

Andrieu and Doucet 

2000; Ireland, 2007) statistical methods. 


In this paper, we propose and study SA algorithms based on quasi-Monte Carlo (QMC) 
point sets - informally, sets of points that more evenly cover the unit hypercube than 
uniform random numbers. QMC point sets are already widely used in numerical analysis 
to evaluate the integral of a function over a hypercube, and an important literature on 


QMC integration error rate has been developed during the last 30 years (Dick and Pil- 


lichshammer 20101. Since QMC point sets are designed to evenly cover hypercubes, they 


intuitively should be an efficient alternative to pseudo-random numbers inside stochastic 
optimization routines. However, their use for global optimization is surprisingly scarce 
and is mainly limited to quasi-random search and some ad-hoc improvements thereof; 
see Section [2.3.11 for a literature review. 

More precisely, we develop a SA algorithm (hereafter referred to as QMC-SA) using as 
input (t, s)ij-sequences. The parameter i? G N regulates the degree of randomness of the 
sequence, with the case R = 0 corresponding to IID uniform random numbers in [0,1)* 
and the limiting case R = oo to {t, s)-sequences, which encompass the most classical 
constructions of QMC sequences such as Sobol’, Fame and Niederreiter-Xing sequences 
(see, e.g, ||Dick and Pillichshammer 2010, Chapter 8). The case R = oo (i.e. deterministic 
QMC-SA) is of particular interest but our main result only holds for i? G N because, for 
multivariate objective functions, some randomness is needed to rule out odd behaviours 
that may be hard to exclude with deterministic point sets. Note that our convergence 
result is valid for any i? G N. Consequently, only arbitrary small randomness is needed 
and thus, as explained below (Section |3.1[ ), can be omitted in practice. For univariate 
test functions, our convergence result also applies for the limiting case R = oo. 

For “standard” (i.e. Monte Carlo) SA algorithms, the choice of the cooling schedule 
(Tn)n>i is a critical one since most convergence results for SA impose conditions on 
this sequence that are model-dependent and usually intractable. This is for instance the 
case for the almost sure convergence result of Belisle (19921 on compact spaces or the 
convergence in probability result of Andrieu et al. (2001) on unbounded domains. 

Regarding this point, the theoretical guarantees of QMC-SA, established on rectan¬ 
gular spaces, have the advantage to impose no problem-dependent conditions on the 
cooling schedule, only requiring one to choose a sequence (T„)„>i such that the series 
logn) is convergent. As a corollary of our analysis, we show that, under weaker 
assumptions than in Belisle (1992)’s result, selecting such a cooling schedule is enough 
to ensure the almost sure convergence of SA. We also note that our analysis applies for a 
broader class of optimization methods, which in particular encompass SA with arbitrary 


acceptance probability function and threshold accepting (Dueck and Scheuer, 1990 Mo- 


scato and Fontanari 1990). To the best of our knowledge, no convergence result exists 


for this latter method and thus this paper provides the hrst theoretical guarantees for 
this class of optimization techniques. 


2 



















































































From an algorithmic point of view, QMC-SA simply amounts to replacing the pseudo¬ 
random random numbers used as input to SA with points taken from a (t, s)ij-sequence. 
The cost of generating the first A^ > 1 points of a (t, s)H-sequence is 0(A^log A^) for any 
i? G N and therefore, for a fixed number of function evaluations, QMC-SA is slower than 
classical SA algorithms. However, the extra cost is small because in most cases we can 
generate points from (t, s)/j-sequences using efficient logical operations; further, the cost 
of simulating these sequences is typically minuscule in comparison to other algorithmic 
steps, such as evaluating the objective function. In addition, and as illustrated below, 
the deterministic version of QMC-SA typically requires many fewer function evaluations 
than SA to reach the same approximation error. 

The rest of the paper is organized as follows. In Section we set the notation and 
present the background material necessary to follow this work. The QMC-SA algorithm 
is presented in Section and its convergence study is given in Section In Section 
we discuss some extensions of QMC-SA; in particular, we will see how our convergence 
results apply to threshold accepting. Section proposes a numerical study to compare 
SA and QMC-SA, notably on a non-differentiable and high dimensional optimization 
problem that arises in spatial statistics. Section [^concludes. 


2. Preliminaries 


In this section we first set the notation we will use throughout this work before giving 
a brief introduction to SA and to QMC methods. In particular, the use of QMC point 


sets for global optimization is motivated and discussed in Section 2.3.1 


2.1. Notation 

Let A C be a measurable set and : A —)■ M. Then, we write ip* = supxg;(:’ <^(x) the 
quantity of interest in this work and V (A) the set of all probability measures on A which 
are absolutely continuous with respect to A(i(dx), the Lebesgue measure on Vectors 
in are denoted using bold notation and, for two integers a and 6, 6 > a, we write a : b 


, X 


N 


} in 


the set of integers {o,... ,5}. Similarly, x^''^ is the set of N points {x^, 
and, for a vector x G M'^, xi:i := (xi,... ,Xj), i G 1 : d. For two real numbers a and b, 
we will sometimes use the notation a V 6 (resp. a Ab) to denote the maximum (resp. the 
minimum) between a and b. 

Let K : A —)• 'P(A) be a Markov kernel whose density with respect to the Lebesgue 
measure is denoted by iL(y|x). We write iLj(x, dyQ, i G 1 : d, the distribution 

of Ui conditional on relative to dL(x, dy) (with the convention = 

iLi(x, dyi) when z = 1) and the corresponding density function is denoted by x) 

(again with the convention iLj(z/j|?/i:j_i,x) = Ki{yi\x) when i = 1). 

For a distribution vr G V{X), we denote by : X ^ [0,1]'^ (resp. ■ [0,1]'^ -A A) 
its Rosenblatt (resp. inverse Rosenblatt) transformation ( jRosenblatt 1952). When d = 

1, the (inverse) Rosenblatt transformation of vr reduces to its (inverse) CDF and, when 
d > 1, the z-th component of Tn- (resp. of F~^) is the CDF (resp. the inverse CDF) 
of component Xi conditionally on (xi,..., Xi_i). Similarly, for a kernel iL : A —?■ 'P(A), 
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we denote by Fk{x,-) (resp. •)) the Rosenblatt (resp. the inverse Rosenblatt) 

transformation of the probability measure -fC(x, dy) G V{X). We refer the reader e.g. to 
Gerber and Chopin ( 2015[ Section 3.1) for a more detailed presentation of these mappings. 
Finally, we write Bs{x) C X the ball of radius 5 > 0 around x G with respect to the 


supremum norm, i.e. Bs{i) = {x € X : 


X — X 


< 5} n X where, for vectors z G 


|z||oo — \^i\’ 


2.2. Introduction to simulated annealing 


As already mentioned, simulated annealing (see, e.g, Nikolaev and Jacobson 2010, for a 
recent overview ) is an iterative stochastic optimization techniques designed to evaluate 
the supremum of a function (p : X C —>■ M; see Algorithm below for a pseudo-code 
version of SA. At iteration n > 1 of SA, and given a current location x"""^ £ X, a 
candidate value y” is generated using the probability distribution iF(x"'“^,dy) G V{X), 
where K : X ^ V{X) is a Markov kernel. Then, a Metropolis step is performed to 
decide whether we move or not to this proposed location. In particular, if y” increases 
the function value, i.e. p{y^) > (^(x"“^), then we move with probability one to the 
location y”, i.e. x” = y” almost surely. However, some downwards moves are also 
accepted in order to escape quickly from local maxima. Given a current location x"'“^ 
and a candidate value y”, the probability to accept a downward move depends on T„, 
the level of temperature at iteration n. The sequence (r„)n>i is strictly positive and 
converges to zero as n —?• oo, allowing for more exploration early in the algorithm. 

There exist various convergence results for SA algorithms, the vast majority of them 
being based on the assumption that the state space is compact; see however Andrieu 


et al. (20011 for results on non-compact spaces. For compact spaces, convergence results 


can be found, e.g., in Belisle (19921; Locatelli (1996 20001; Haario and Saksman (1991) 


see also Lecchini-Visintini et al. (2010) for results on the finite time behaviour of SA on 


compact spaces. 


2.3. Introduction to quasi-Monte Carlo 

As mentioned in the introduction, QMG point sets are sets of points in [0,1)'^ that are 
more evenly distributed than IID uniform random numbers. There exist in the QMG 
literature many different measures of uniformity of a point set in [0,1)'^, the most 
popular of them being the star discrepancy, defined as 

N d 

D\u^--^)= sup I Vl(u”G [0,6)) -n^J, 

' n=l i=l ' 


where !(•) denotes the indicator function. A QMG (or low discrepancy) point set may be 
formally defined as a set of points in [0,1)*^ such that = 0{N~^{logN)^). 

Note that for a set of IID uniform random numbers, = 0(A^“^/^ loglog A^) 

almost surely (see, e.g., Niederreiter, 1992 p.l67). There exist various constructions 


of QMG point sets see for instance the books of 


Niederreiter 


(1992); 


Dick and 


Pillichshammer (2010) for more details on this topic. 
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2.3.1. Quasi-Monte Carlo optimization 


If QMC point sets are widely used to evaluate the integral of a function, there has been 
remarkably little use of QMC for global optimization, with efforts primarily limited to 
QMC versions of random search. We start by discussing this work in order to motivate 
why and how the good equidistribution properties of QMC point sets may be useful to 
improve stochastic optimization methods. 

Consider the problem of computing ip* for a function p : [0,1)^ —>■ M. Then, a simple 
way to approximate p* is to compute m{p,u^''^) := maxi<„<jv where is a 

set of N points in [0,1)“ 


Niederreiter 


(1983) shows that, for Lipschitz functions p, 


p* - m(p, < C(p}d(u^'-^) 

for a constant C(p) < oo which depends only on p and where 


= max min 
xe[0,l)‘^ l<n<N 


U — X 


is the dispersion of on [0, l)*^, which measures the maximum distance between a 
point in the state space and the points used in the search. Intuitively, the good partition 
of the points of a QMC point set inside the unit hypercube should translate into a small 
dispersion. And indeed, there exist several constructions of QMC point sets which verify 
for a constant C < oo, where the dependence in Ai > 1 is optimal 
( |Niederreiter 1992 Theorem 6.8, p.l54). 

Ad hoc improvements of quasi-random search, desig ned to improve upon the non - 
adaptive convergence rate, have been proposed by Niederreiter and Peart (1986), 


Hickernell and Yuan ( ]1997 ), Lei (2002), and Jiao et al. (2006), see also Fang (2002) and 


references therein. Finally, note that QMC optimization has been applied successfully in 
statistics for, e.g., maximum likelihood estimation and parameter estimation in nonlinear 


regression models (see Fang et al. 1996 

Fang 

2002 

and references therein) as well 

as for portfolio management ( 

Pistovcak and Brener 

2004 

) and power system tuning 


(Alabduljabbar et al., 2008). 


In optimization, we often run the algorithm until a given tolerance criterion is fulfilled, 
and thus the number of function evaluations is not known in advance. Therefore, we 
will focus in this paper on QMC point sets obtained by taking the first N points 
of a sequence (u”)„>i. Most constructions of QMC sequences belong to the class of the 


so-called (t, s)-sequences (Niederreiter 1987) which are, as shown below, well adapted 


for optimization and therefore constitute the key ingredient of QMC-SA. 


2.3.2. Definition and main properties of (t, s)-sequences 

For integers b >2 and s > 1, let 

S 

Tg = I [ajb~'^:>, {uj + 1)5“'^^) C [0,1)®, Oj, dj G N, aj < 5'^-’', j G 1 : s| 

i=i 

be the set of all 6-ary boxes (or elementary intervals in base 6) in [0,1)®. Then, we 
introduce the notion of {t, s)-sequences through two definitions: 
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Definition 1. Let m>0, b>2, 0<t<m and s > 1 be integers. Then, the set 
{U-Inlo' Of b^ points in [0,1)* is called a {t,m, s)-net in base b in every b-ary box 
E £ of volume b^~^ contains exactly 6* points of the point set 


Definition 2. Let b>2, t>0, s>l be integers. Then, the sequence (u”)„>o of points 
in [0,1)* is called a [t, s)-sequence in base b if, for any integers a > 0 and m>t, the set 
{'^'^}^n=ab^ ^ is a {t,m, s)-net in base b. 


An interesting property of (t, s)-sequences for optimization is that, for any > 1, 
the dispersion of the first N points of a {t, s)-seqnence is bonnded by for a 

constant < oo (Niederreiter 1992 Theorem 6.11, p.l56), where the dependence in 


N is optimal as recalled in the previous subsection. Over [t, m, s)-nets, t he dispersion is 
bounded by for a constant 


< Cb,t,s (Niederreiter 


1992 


Theorem 6.10, 

p.l56 ). Note that the integer b determines how often sets of points with good coverage of 
the hypercube arrive as we go through the sequence, while the parameter t > 0 measures 
the quality of these sets. In particular, for a given value of s and 6, the smaller t is the 
better the (t, m, s)-net spreads over the unit hypercube. However, for a given value of 
b >2 and s, (t, s)-sequences exist only for some values of t and, notably, a (0, s)-sequence 


(see, e.g., Dick and Pillichshammer 

2010 Corollary 4.36, p.141). We 

Niederreiter (1992| Chapter 4) and 

Dick and Pillichshammer (2010 


Chapter 4) for a more complete exposition of {t, s)-sequences. 


3. Quasi-Monte Carlo simulated annealing 

The QMC-SA algorithm we study in this work is presented in Algorithm Note that, 
when the input sequences are IID uniform random numbers in [0,1)'^"’“^, Algorithm]^ 
reduces to standard SA where we have written the simulations of the candidate value 
y” and the Metropolis steps as a function of d + 1 random numbers using the inverse 
Rosenblatt transformation of the Markov kernel. 

To obtain a QMC version of SA, Algorithm simply amounts to replacing the pseudo¬ 
random numbers used in classical simulated annealing algorithms by specific QMC se¬ 
quences, whose choice is crucial for the performance of the optimization routine. The 
SA algorithm proposed in this work is based on what we call (f, s)/j-sequences, that we 
introduce in the the next subsection. 

3.1. (t, s)ij-sequences: Definition and their use in QMC-SA 


Definition 3. Let b>2, t>0, s>l and R £ N be integers. Then, we say that the 
sequence (u^)„>o of points in [0,1)* is a [t, s) R-sequence in base b if, for alln > 0 (using 
the convention that empty sums are null), 

R 

uS = = + iei:s 

k=l 
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where the ’s are IID uniform random variables in [0,1) and where the digits a^- ’s in 
{0,..., 6 — 1} are sueh that (uJj 3 )„>o is a (t, s)-sequence in base b. 


The parameter R therefore regulates the degree of randomness of the sequence (u^)„>o, 
with the two extreme cases R = 0 and R = oo corresponding, respectively, to a sequence 
of IID uniform random variables and a completely deterministic sequence. For R > t, 
note that, for all a G N and for all m £ t : R, the set ^ is a {t, m, s)-net in 

base b. In addition, for any R < oo and n > 0, is uniformly distributed in a 6-ary 
box En G Sg of volume b~^^, whose position in [0,1)® depends on the deterministic part 
of u^. 


In practice, one will often use i? = 0 or the limiting case R = oo. But, to rule out 
some odd behaviours that cannot be excluded due to the deterministic nature of {t, s)- 
sequences, our general consistency result only applies for i? G N. As already mentioned, 
note that an arbitrary small degree of randomness suffices to guarantee the validity of 
QMC-SA since our consistency results holds for all ii G N. 

However, in practice, the randomization can be omitted. Indeed, SA algorithms are 
often run for a maximum of A^ < oo iterations and, if the result is not satisfactory, 
instead of increasing N, it is usually run again using a different starting value and/or a 
different input sequence. For most constructions of (t, s)-sequences (e.g. Sobol’, Fame 
and Niederreiter-Xing sequences), = 0 for all k > kn, where kn denotes the smallest 
integer such that n < b^". Thus, for a hxed N, if one chooses R > fc^r, the randomization 
is not necessary. 

Note also that (t, s)/j-sequences contain the practical versions of scrambled (t, s)- 
sequences (Owen, 19951. A scrambled (f, s)-sequence (u”)„>o is obtained from a {t,s)- 
sequence (u’^)„>o by randomly permuting (‘scrambling’) the digits a'ffs of its components 
in a way that preserves the equidistribution properties of (u"'),i>o. In particular, the ran¬ 
dom sequence (u”)„>o is a {t, s)-sequence in base b with probability one and, in addition, 
d” 


b-k 

such that ^ 0 for all k, i, s but, in practice, the inhnite sum is truncated and instead 


ZY([0,1)^) for all n > 0. The sequence (u”)„>o has component ft/ = F 


ki 


we use the sequence (u”)„>o, with component vf 


k=l 


di^b-^ ' 


+ ( 


^zf where, as 


above, the zf’s are IID uniform random variables in [0,1) (see Owen, 1995). 


3.2. Practical implementation 

Algorithm simply amounts to replacing the pseudo-random numbers used in clas¬ 
sical simulated annealing algorithms by points taken from a (0, l)-sequence and from a 
(t, s)/j-sequences (u^)„>o. In practice, the generation of (f, s)-sequences is an easy task 
because most statistical software contain routines to generate them (e.g. the package 
randtoolbox in R or the class qrandset in the statistical toolbox of Matlab). Gen¬ 
erating (u^)„>o for R G N+ is more complicated because one should act at the digit 
level. However, as a rough proxy, one can generate (u^)„>o as follows: generate a (t, d)- 
sequence (u"')„>o and set = u"" -|- b~^~^z'^, where the z"'’s are HD uniform random 
variables in [0,1)'’*. 

Concerning the computational cost, generating the hrst N points of most constructions 
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Algorithm 1 Generic QMC-SA algorithm to compute supxg;t^ 


Input: A starting point G A, a Markov kernel K acting from A to itself, Ti > 0, 
(u^)„>o, a (t, (i)ij-sequence in base b >2 and {v'^)n>o a (0, l)-sequence in base b. 
for n = 1 —)• A do 
Compute y" = 

if < exp{((/7(y’^) — <^(x"’“^))/r„} then 

— y«. 

else 


x" = x'^-i 


end if 

Select Tn+i G (0, T^] 

end for 


of (t, s)-sequences requires O(AlogA) operations (Hong and Hickernell 20031. This is 
slower than random sampling but the extra cost is particularly small when 6 = 2 since, 
in that case, bits operations can be used to generate the points of the sequence. 

Steps 2-7 of Algorithm IH sample x” from a distribution A(x"'“^,dx) G 7^(A) using 
the point (u^, v'^) G [0, Thus, although not required for our consistency results, it 

seems a good idea to choose the (0, l)-sequence {v^)n>o and the limiting {t, d)-sequence 
(u^)„>o such that the sequence (v”)„>o in [0, with component v” = 

is a {t,d + l)-sequence. Note that this is a weak requirement because most standard 
constructions of {t,d + l)-sequences ((uj^, r;”))„>o, such as, e.g., the Sobol’ or the Fame 
sequences (see Dick and Pillichshammer, 2010, Chapter 8, a definitions), are such that 
(u^)„>o and (u”)n>o have the right properties. 

Finally, compared to Monte Carlo SA, QMC-SA has the drawback of requiring Markov 
kernels K that can be sampled efficiently using the inverse Rosenblatt transformation 
approach. However, this is not an important practical limitation because SA algorithms 
are often based on (truncated) Gaussian or Cauchy kernels for which it is trivial to 


compute the inverse Rosenblatt transformation. Again, we refer the reader e.g. to Gerber 


and Chopin (|2015 Section 3.1) for a detailed presentation of this sampling strategy. 


4. Convergence results 

The convergence study of Algorithm is conducted under the assumption that A is 
a non-empty closed hyperrectangle of M'’*. Without loss of generality, we assume that 
A = [0,1]'’* in what follows. 

Our results require some (weak) regularity assumptions on the Markov kernel in order 
to preserve the good equidistribution properties of the QMC sequences. More precisely, 
we consider the following two assumptions on K : [0,1]'’* —)• 'P([0,1]'’*) : 

(Ai) For a fixed x G A, the i-th component of y) is strictly increasing in yi G [0,1], 

i G 1 : d; 
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(^ 2 ) The Markov kernel K{x,dy) admits a continuous density -fC(y|x) (with respect to 
the Lebesgue measure on [0,1]'^) such that, for a constant > 0, i^(y|x) > ^ for 
all (x, y) G 


Assumption |(Ai) ensures that, for any x G A", the inverse Rosenblatt transformation 
of F/^(x, dy) G V{X) is a well defined function while Assumption [(A^ is used, e.g., in 
Belisle| ( [l9i^ . 


In the next subsection we state some preliminary results on the dispersion of point sets 
obtained by transforming a {t, s)-sequence through inverse Rosenblatt transformations. 
These results provide key insight into how QMC may improve the performance of SA; 
readers mostly interested in the main results can however skip this subsection and go 


directly to Sections 4.2 and 4.3 


4.1. Preliminary results 


The following lemma provides conditions on the Markov kernel K : X ^ V{X) so that 
the inverse Rosenblatt transformation T^^(x, •) converts a low dispersion point set in 
[0,1)'^ into a low dispersion point set in X. 


Lemma 1. Let X = [0,1]'^ and (u”)„>o be a {t, d)-sequence in base b >2. Let K : X ^ 
V{X) be a Markov kernel such that Assumptions fAiJ (A 2 ) hold. Let (x,x') G X^ and 
y" = u""), n > 0. Then, there exists a 6 k > 0 such that, for any 6 G (0,5/^], 

there is a ks G N, ks > t X d, such that, for any a G N, the point set ^ 


contains at least 6* points in i? 5 (x'). In addition, 5 k o.nd ks do not depend on the point 
(x,x') G X^. Moreover, the result remains true i/y” = u”) with yX G R^^(5)(x) 

for all n > 0 and where vk '■ — >■ M"*" is independent of (x,x') G X'^, continuous 

and strictly increasing, and such that vk{6) —t 0 as (5 —)• 0. 


Proof. See Appendix [A| for a proof. □ 

As a corollary, note the following. Let A; > t, be a {t, k, (i)-net in base 

b >2 and define 


dx(Pb,k, K) = sup min ||F^^(x, u”) - x||oo, x G A” (1) 

x&X nSOift'' —1 

as the dispersion of the point set in X. Then, under the conditions of 

the previous lemma, dx{Pb,kj^j K) — t 0 as A: —)• 00 , uniformly on x G A”. 

Lemma provides an iterative random search interpretation of Algorithm Indeed, 
at location x” = x, this latter go through a low dispersion sequence in the state space X 
until a candidate value is accepted, and then the process starts again at this new location. 
The last part of the lemma shows that it is not a good idea to re-start the sequence each 
time we move to a new location because the point set {P^^(x”, may have good 

dispersion properties in X even when x” 7 ^ x” for some n,n' G 0 : ( 6 ^ — 1). 

It is also worth noting that Lemma gives an upper bound for the jumping time of 
the deterministic version of Algorithm that is, for the number rUn of candidate values 
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required to move from location x” = x to a new location x”'"''™" ^ x. Indeed, let x* G 
be a global maximizer of bn = ||x” — x*||oo and assume that there exists a (5 G (0, bn) 
such that (/? (xl > y?(x”) for all x G Bs{x*). Then, by the properties of (t, s)-sequences 
(see Section 2.3.2), the point set with ks as in Lemma contains at least 

one {t,ks,s)-net and thus, by this latter, there exists at least one m G 1 : 2ks such 
that = F^^(x, belongs to Bs{'x*). Since > <^(x), we indeed have 

m-n < 2ks < 2ks^. 

It is clear that the speed at which dx{Pb,k^%K) goes to zero as k increases depends 
on the smoothness of the transformation (x, •), x G Thus, it is sensible to choose 
K such that dx{Pb,k,^, K) goes to zero at the optimal non-adaptive convergence rate 
(see Section 2.3.1). The next lemma provides sufficient conditions for the kernel K to 
fulfil this requirement. 


Lemma 2. Consider the set-up of Lemma^ and, in addition, assume that, viewed as a 
funetion of (x, y) G TV(x, y) is Lipschitz with Lipschitz constant Ck < oo for the 
supremum norm. Let 


Ck = 0-5K{l A {0.25K/CKy), .^ = min{ inf Ki{yi\yi:i-i,x)}. 

i&l.d (x,y)eA’2 

Then, the result of Lemma^^holds for ks = [t + d — dlog(<5C'x/3)/log6] and for bx = 
(3/C'i^) A0.5. 

Proof. See Appendix for a proof. □ 

Let bk G (0, bx] be the size of the smallest ball around x' £ X that can be reached by the 
point set {Fx^ u^)}n=o ^ with k > ksj^, x£ X and where, as above, Pb^k = 
is a (t, A:,d)-net in base b > 2. Then, under the assumptions of Lemma b^ = 
for a constant C > 0 independent of x' and thus sup^g^y d;r’(P6^fc, x, iL) = 0{b~^^^) as 
required. 


4.2. General consistency results for QMC-SA 

To obtain a convergence result for Algorithm we need some regularity assumptions 
concerning the objective function (p. To this end, and borrowing an idea of|He and Owen 


(2015) (who study QMC integration of functions restricted to a non-roctangular set), we 
impose a condition on the Minkovski content of the level sets 


■= {x G A” : ip{x!) = <y9(x)}, X G A”. 

Definition 4. The measurable set A C X has a {d— 1)-dimensional Minkovski content if 
M{A) := lim^^o exists and is finite, where, for e > 0, we use the shorthand 

(A)e = {x G Al : 3x' G A, ||x — x'||oo < e}- 

The following lemma is the key result to establish the consistency of QMC-SA. 

Lemma 3. Consider Algorithm^where X = [0,1]'^ and assume the following conditions 
are verified: 
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1. The Markov kernel K : X ^ V{X) is such that Assumptions fAiJ (A 2 ) hold; 


2. ip is continuous on X and such that < 00 . 

Let i? G N. Then, if the sequence {(p{x^))n>i is almost surely convergent, with probability 
one (^(x”) ip* as n ^ 00 . 


Proof. See Appendix [C] for a proof. 


□ 


Note that the assumption on the Minkovski content of the level sets notably implies 
that (p has a finite number of modes. 

Thus, if in Algorithm only upward moves are accepted, then the resulting ascendant 
algorithm is convergent. To establish the consistency of QMC-SA, we therefore need 
to ensure that not too many “large” downward moves are accepted. This is done by 
controlling the rate at which the sequence of temperature (T„)n>i converges toward 
zero, as shown in the next results. We recall that kn denotes the smallest integer such 
that n < 6^". 

Lemma 4. Consider Algorithm^ with A C and where {v^)n>o is such that = 0. 
Then, at iteration n > 1, y” is rejected if ip{y'^) < ip{X^~^) — Tnknlogb. 

Proof To prove the lemma, note that for any k G N the point set is a (0, k, 1)- 

net in base b and therefore contains exactly one point in each elementary interval of 
length b~^. Hence, because = 0, the point set contains no point in the 

interval [0, b~^) and thus, for all n > 1, y” is rejected if 


exp ({(^(y-) - ip{-K--^)]/Tn) < b-^-. 


□ 

Lemma 5. Consider Algorithm^ with A C and assume that there exists a sequence 
{ln)n>i of positive numbers which verifies such that x”’ = x”“^ when 

(^(y”) < ip{x^~^) — Ijj^. Then, if cp is bounded, the sequence ((/9(x"'))„>o is almost surely 
convergent. 

Proof. Let ip = limsup^^g^ (/j(x"') and ip = liminf^^oo ¥^(x"'). Then, because ip is 
bounded, both ip and ip are finite. We now show that, under the conditions of the 
lemma, we have ip = ip with probability one. In what follows, although not explicitly 
mentioned to save space, all the computations should be understood as holding with 
probability one. 

Let (x®")s„>o and (x“")„„>o be two subsequences such that y?(x^”) —)• ip and ip{x^") —?■ 
ip. Let e > 0 and > 1 be such that |(/?(x^") — (p\ < 0.5e and \ip{yX^) — ip\ < 0.5e for all 
n > N,.. Then, for all n > we have 


00 00 

¥^(x”) > V^(x"^0 - >T- 0.5e - l-^ 


i=SNe 
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and, in particular, (p — 0.5e — Y^^=sm ^ ^ + 0.5e, \/un > max(sAr^, uatJ, 

implying that — e + X^Ssjv addition, the series i® convergent and 

thus — >• 0 as sat^ increases. Therefore, there exists a e > 0 and a A^e > 0 for 

which (p — (f < e + +i ~ showing that we indeed have (p = (p. □ 

Using Lemmas |3]j^ we deduce the following general consistency result for Algorithm 

Theorem 1. Consider Algorithm^ where X = [0,1]'^. Assume that the assumptions of 
Lemma^hold and that, in addition, (f”')n>o is such that = 0. Then, if {Tn)n>i is 
such that '^n log(n) < oo, we have, for any R € N and as n ^ oo, (p{X^) —t p>* 

almost surely. 

Proof. Let In = (T„A:„ log 5)“^ and note that, under the assumptions of the theorem, 
Y^=i^n^ < Therefore, by Lemmathe sequence {ln)n>o verifies the assumptions 
of Lemma Hence, because the continuous function (p is bounded on the compact space 
X, the sequence (<y9(x"'))n,>o is almost surely convergent by Lemmaand thus converges 
almost surely toward the global maximum ip* by Lemma □ 

As already mentioned, this result does not apply for R = oo because at this degree of 
generality we cannot rule out the possibility of some odd behaviours when completely 
deterministic sequences are used as input of QMC-SA. However, when d = 1, things 
become easier and we can show that the result of Theorem holds for the sequence 

(^oo)n>0 

Theorem 2. Consider Algorithm^ where X = [0,1] and R = oo. Assume that p>, K 
and {Tn)n>i verify the conditions of Theorem^ Then, p>{x^) (p* as n ^ oo. 

Proof. See Appendix [D] for a proof. □ 


Remark that, when d = 1, the condition on the Minkovski content of the level sets of 
if given in Lemma amount to assuming that, for all x G A such that p>{x) < (p* , there 
exists a 5x > such that ^p{y) / t{x) for all y G Bs^(x), y ^ x. 

A key feature of Theorems and is that the condition on the cooling schedules 
{Tn)n>i depends neither on ip nor on the choice of the Markov kernel K, while those 
necessary to ensure the convergence of standard Monte Carlo SA algorithms usually do. 


This is for instance the case for the almost sure convergence result of Belisle (1992 


Theorem 3); see the next subsection. An open question for future research is to establish 
if this property of QMC-SA also holds on non-compact spaces, where convergence of SA 
is ensured for a sequence (T„)n>i where r„ = To/log(n -|- C) and where both Tq > 0 


and C > 0 are model dependent (see Andrieu et al.[ 2001 Theorem 1). The simulation 


results presented below seem to support this view (see Section |6.2[ ) . 

Finally, note that the cooling schedule may be adaptive, i.e. Tn+i may depend on 
and that the convergence rate for {Tn)n>i implied by Theorem]^ is coherent with 


Belisle ( |1992 Theorem 2) which shows that almost sure convergence cannot hold if 
E“=iexp(-r-Q =oo. 
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4.3. The special case R = 0 and a new convergence result for SA 

It is worth noting that Theorem applies for R = 0] that is, when IID uniform random 
numbers are used to generate the candidate values at Step 2 of Algorithm Remark 
that, in this case, the use of the inverse Rosenblatt transformation to sample from the 
Markov kernel is not needed. In addition, it is easy to see from the proof of this result 
that the assumption on (p can be weakened considerably. In particular, the continuity of 
(p in the neighbourhood of one of its global maximizer is enough to ensure that, almost 
surely, —>• ^p* (see the next result). 

Since Theorem also applies when ii = 0, the key to remove the dependence of the 
cooling schedule to the problem at hand therefore comes from the sequence (f"')n> 0 ) 
used in the Metropolis step, which discards candidate values which are such that 
p{y'^) — is too small (see Lemma above). This observation allows us to propose 

a new consistency result for Monte Carlo SA algorithms on compact spaces; that is, when 
Step 2 of Algorithm is replaced by: 

2’: Generate y” ~ iL(x"'“^,dy) 

and when {v^)n>o is replaced by a sequence of IID uniform random numbers in [0,1). 

Theorem 3. Consider Algorithm with R = 0, X C a bounded measurable set, 
Step 2 replaced by Step 2’ and the sequence {v'^)n>o replaced by ('i)"')n>0; o, sequence of 
IID uniform random numbers in [0,1). Assume that the Markov kernel K : X ^ V{X) 
verifies Assumption rcl and that there exists a X* ^ X such that p(x*) = p* and such 
that p is continuous on Bs{x*) for a 6 > 0. Then, if (T„)n>i satisfies log(n) < 

oo, we have, as n ^ oo, p(x^) —)■ p* almost surely. 

Proof. Let a > 0 so that Pr('y”' < < oo. Thus, by Borel- 

Cantelli Lemma, with probability one D"' > for all n large enough. Therefore, for 

n large enough and using similar arguments as in the proof of Lemma y” is rejected 
with probability one if p{y^) < p{iC~^) —Tn{\ + a){\ogn) and hence, by Lemmaj^ with 
probability one there exists a G M such that p{x^) —>• p. To show that we almost surely 
have p = p*, let x* be as in the statement of the theorem. Then, using the fact that 
the Markov kernel verifies Assumption [(^ 2 )] it is easy to see that, with probability one, 
for any e G Q+ the set Re(x*) is visited infinitely many time by the sequence (y"")?!^!- 
Then, the result follows from the continuity of p around x*. □ 

The conditions on X and on p are the same as in the almost sure convergence result of 
, Theorem 3) but the condition on the Markov kernel is weaker. Finally, this 
requires that T„ < ^/{nen) for a model-dependent and strictly increasing 
sequence (en)n>i, while Theorem establishes the almost sure convergence of SA for a 
universal sequence of temperature. 


Belisle (1992 


latter result 


5. A general class of QMC-SA type algorithms 

We saw in Section 4.2 that, if only upward moves are accepted in Algorithm then the 
resulting ascendant algorithm is convergent. Thus, if one wishes to incorporate downward 
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Algorithm 2 Generic QMC-SA type algorithm to compute supxg;t^ 

Input: A starting point G A, a Markov kernel K acting from A to itself, /i > 0 and 
(u^)„>o, a (t, (i)ij-sequence in base b >2. 
for n = 1 —)• A do 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 


RJ 


Compute y” = ^,u' 

if then 

— y-n 

else if < </?(x"'“^) — then 

x" = x'^-i 
else 

Set = x"'“^ or x” = y” according to some rule 

end if 

Select In+l G [ln,Oo) 

end for 


moves in the course of the algorithm, we need to control their size and their frequency. 
In SA algorithms, downward moves are introduced through the Metropolis step; suitable 
assumptions on the sequence of temperatures (Tn)n>i guarantee the convergence of the 
algorithm. 

Interestingly, our convergence results extend to Algorithm where a more general 
device is used to introduce downward moves. 


Corollary 1. Consider Algorithm^ where A = [0,1]'^ and assume that K and ip verify 
the assumptions of Theorem^ Then, if the sequence {ln)n>i is such that < o®; 

we have, for any i? G N and as n ^ oo, p{x^) —)• p* almost surely. In addition, if d = 1, 
p{x^) —>■ p* when R = oo. 


Proof The result for d > 1 is a direct consequence of Lemmas and while the result 
for d = 1 can be deduced from Lemma and from the proof of Theorem □ 

The fact that, under the assumptions on (v^)n>o and on (T„)n>i of Theorems and 
Algorithm indeed reduces to a particular case of Algorithm which verifies the 
assumptions of Corollary is a direct consequence of Lemma 

The modification of the exponential acceptance probability function to accelerate 
the convergence of the algorithm is a classical problem in the SA literature, see e.g. 
Rubenthaler et al. (2009) and references therein. Regarding this point, an important 


aspect of the connection between QMC-SA and Algorithm is that, in the Metropolis 
step of Algorithmic we can replace the exponential function by any strictly increasing 
function / : M —)• [0,1] which satisfies /(O) = 1 and limt_>._oo f(t) = 0, and the conver¬ 
gence results of Section |C remain valid under the simple condition that (Tn)n>i verifies 

{r„/-‘r'-'")) < oo. 

Another interesting case of Algorithm is the (Q MC version of) Threshold Accepting 
(TA) algorithms introduced by Dueck and Scheuer (19901; [Moscato and Fontanari (1990), 
where in Step 8 we always set x” = y"'. Like SA, TA has been introduced for optimization 
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problems on finite state spaces but has been successfully applied for continuous problems 
in various fields, for instance see Winker and Maringer (20071 and references therein for 
applications of TA in statistics and in economics. 

if convergence properties of SA are now well established, to the best of our knowledge 
the only convergence result for TA is the one of Althofer and Koschnick (19911, obtained 
for optimization problems on finite spaces. However, their proof is not constructive in 
the sense that they only prove the existence of a sequence of thresholds {ln^)n>i that 
provides convergence within a ball of size e around the global solution. To this regards. 
Corollary therefore constitutes the first convergence result for this class of algorithms. 


6. Numerical Study 

The objective of this section is to compare the performance of classical SA (i.e. Algorithm 
[phased on IID random numbers) with QMC-SA to find the global optimum of functions 
defined on continuous spaces. 

To compare SA and QMC-SA for a large number of different configurations, we first 
consider a bivariate toy example for which we perform an extensive simulation study 
(Section |6.1[ ). Then, SA and QMC-SA are compared on a difficult optimization problem 
arising in spatial statistics and which involves the minimization of a non-differentiable 
function defined on an unbounded space of dimension d = 102 (Section |6.2[ ). 

In all the simulations below the comparison between SA and QMC-SA is based on 1 000 
starting values sampled independently in the state space. The Monte Carlo algorithm is 
run only once while QMC-SA is implemented using a Sobol’ sequence as input (implying 
that 6 = 2 and R = oo). 


6.1. Example 1: Pedagogical example 


Following Robert and Casella (2004 


minimizing the function (pi : Xi := [— 


Example 5.9, p.l69) we consider the problem of 
1,1]^ —)■ M"*" defined by 


(^i(xi, X 2 ) = (xi sin(20x2) -|- X 2 sin(20xi))^ cosh ( sin(lOxi)xi) 
-|- (xi cos(10x2) — X 2 sin(lOxi))^ cosh (sin(20x2)x2). 


( 2 ) 


This function has a unique global minimum at (xi, X 2 ) = (0, 0) and several local minima 
on Xi] see Robert and Casella (2004 p.l61) for a grid representation of (pi. The com¬ 
parison between SA and QMC-SA for this optimization problem is based on the number 
of iterations that is needed to reach a ball of size 10“^ around the global minimum of (pi. 
To avoid infinite running time, the maximum number of iterations we allow is A^ = 2^^. 

The SA and QMC-SA algorithms are implemented for Markov kernels 

A:(-^)(x,dy) = /j^^^^^^j(?/i,xi,cr)/j^:(^^^^j(y2,a:2,o-)dy, j G 1 = 1,2, 


where, for j = 1 (resp. j = 2), denotes the density of the Cauchy (resp. 

Gaussian) distribution with location parameter p and scale parameter d > 0, truncated 
on / C M. Simulations are done for a G {0.01,0.1,1,10}. 
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We consider three different sequences of temperatures m G 1 : 3, defined 


by 


= Ti^\n^+nogn)-\ /n, = T^^\\ogn)-^ 


and where we choose e = 0.001 and 


E {20,200,2 000}, E {2,20,200}, E {0.02,0.2,2}. 


( 3 ) 


( 4 ) 


Note that {Tn^)n>i is such that results of Theorems lip hold while {Tn^ 

(TA"^ )n>i ) is the standard choice for SA based on Cauchy (resp. 
walks (see, e.g., Ingber 19891. However, on compact state spaces, SA based on these 


n>i (resp. 
Gaussian) random 


two kernels is such that the sequence ((^i(x"'))„>i converges in probability to the global 
minimum of (pi for any sequence of temperatures (T„)„>i such that —>■ 0 as n 


oo 


(see Belisle 1992 Theorem 1). 

Simulations are performed for different combinations of kernels and temperatures 
(jdm))^>i each of these combinations, simulations are done for all values of 
given in (Q and for all a E {0.01,0.01,1,10}. Altogether, we run simulations for 56 
different parametrisations of SA and QMC-SA. The results presented in this subsection 
are obtained from 1 000 starting values sampled independently and uniformly on Ai. 

Figure shows the results for the two kernels and for the sequences of temperatures 
given in (|^ where, for m E 1 : 3, Tg™^ is the median value given in (0). The results for 
the other values of Tg™'^ are presented in Appendix E (Figures [^and^). 

Focussing first on the results for the Cauchy kernel (first row), we observe that QMC- 
SA is never unambiguously worst than SA and is significantly better in most cases. 
The performance of QMC-SA becomes globally better as we increase the step size a 
(we however note that QMC-SA tends to be the less efficient when cr = 1) with the 
best results for QMC-SA obtained when o" = 10 where, in several settings, the maximum 
hitting time of an error of size 10“^ is around 100 (against 10^'® for SA). A last interesting 
observation we can make from this first set of simulation is that we obtain in several cases 
exactly the same hitting time for different starting values of the QMC-SA. 

The results for the Gaussian kernel show a slightly different story. Indeed, when 
cr = 0.01, QMC-SA provides a poorer performance than SA. The reason for this is that, 
as the tails of the Gaussian distribution are very small in this case, a large value of 
11 u)}, I loo is needed at iteration n of the algorithm to generate a candidate value y"" far 
away from the current location. However, the number of such points are limited when we 
go through a {t, d)-sequence and therefore QMC-SA explores the state space Ai mostly 
through very small moves when a kernel with very tiny tails is used. As a takeaway, when 
the proposal kernel has small variance alongside light tails, little benefit is obtained by 
QMC-SA. 

Altogether, we observe that SA unambiguously outperforms QMC-SA in only 8 of the 
56 scenarios under study (Figures |ldp^ with cr = 1, and, in Appendix]^ Figure 3c with 
cr = 1, Figure 4a with cr = 1 and Figures [4b||4d| with cr = 0.01). However, none of these 
situations correspond to a good (and hence desirable) parametrization of SA. 
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(d) 


(e) 


(f) 


Figure 1: Minimization of (p\ defined by (|^ for 1000 starting values sampled independ¬ 
ently and uniformly on X\. Results are presented for for the Cauchy kernel 
(top) and for the Gaussian kernel. For each kernel, simulation are done for 
where m = 1 (left plots), m = 2 (middle) and m = 3, with 
the median of the values given in Q. The plots show the minimum number 
of iterations needed for SA (white boxes) and QMC-SA to find a x G Ai such 
that (^i(x) < 10“^. For each starting value, the Monte Carlo algorithm is run 
only once and the QMC-SA algorithm is based on the Sobol’ sequence. 


17 
















































o=0.005 0=0.01 0=0.03 0=0.05 

(a) 


o=o!o 05 0=6.01 o=0.03 0=0.05 

(c) 

Figure 2: Minimization of (px for 1000 starting values sampled independently in X 2 (as 
explained in the text) and for A = 0.1 (top) and for A = 0.01 (bottom). Results 
are presented for a Cauchy random walk with step size as described in the text, 
and for {Tn^)n>i with = 5 000 (left) and for {T^'^)n>i with Tg^^ = 0.01 
(right). The plots show min{(^ 2 (x”'), n G 1 : 2^’^} obtained by SA (white boxes) 
and QMC-SA for each starting value. The results are obtained for di = 100 
locations. For each starting value the Monte Carlo algorithm is run only once 
and the QMC-SA algorithm is based on the Sobol’ sequence. 






10“ ■ 

o=o!o 05 0=6.01 o=0.03 o=0.05 


(b) 

10 ^- 

10“ 

S 10^ ■ 


1 

: 1 

1 : ^ 

nJ-L ^ 

.X 1 



O=0h05 0=6.01 O=0.03 0=0.05 


(d) 



6.2. Example 2: Application to spatial statistics 


Let {T(x) : x G be a spatial process and consider the problem of estimating the 
variogram E(|y(xj) — y(xj)p) for i,j G 1 : di. When the process is assumed to be 
stationary, this estimation is typically straightforward. However, for most real-world 
spatial problems arising in climatology, environmetrics, and elsewhere, inference is much 
more challenging as the underlying process {y(x) : x G is inherently nonstationary. 

A simple way to modeling nonstationary spatial processes is to use a dimension ex¬ 
pansion approach, as proposed by Bornn et al. (2012|. For the sake of simplicity, we 


assume that the process {y([x, 2 ;]) : [x, 2 ;] G M^} is stationary; that is, adding only one 
dimension is enough to get a stationary process. Thus, the variogram of this process 
depends only on the distance between location i and j and can be modelled, e.g., using 
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the parametric model 

[^ 2 ,Z 2 ]) = (/>l(l - exp{-||[xi,zi] - [X 2 , 2 : 2 ] 11/02}) 

where 0i and 02 are two positive parameters to be learned from the data. 

Assuming that we have M > 2 observations {ym,i}m=i location i G 1 : di, the 
solution to this problem is obtained by minimizing (px : X 2 := M"*" x M'*' x —)• M"*", 

defined by 


0a( 01) 02, z) — 


di 

E 

i<i<j 


(iJ) 




I)}Ea||z| 


(5) 


where A > 0 control the regularity of z and where m'=i Id™,* “ 2/m'jP 

is an estimate of the spatial dispersion between locations i and j. Note that (px is non- 
differentiable because of the Li-penalty and is both high-dimensional (with 1 parameter 
per observation) as well as nonlinear (due to way the latent locations factor into the 
parametric variogram). To further complicate matters, the objective function’s minimum 
is only unique up to rotation and scaling of the latent locations. 


Following Bornn et al. (20121, the observations are generated by simulating a Gaussian 


process, with di = 100 locations on a three dimensional half-ellipsoid centered at the 
origin and M = 1000. We minimize the objective function ([^ using both SA and 
QMC-SA with a Cauchy random walk defined by 


A'(x,dy) = 








where is as in the previous subsection. Simulations are performed for 


[cr 


( 1 ) 


= a X (O.l,O.l,O.50M(y-,i), ■ ■ •,O.50M(y-,di)), 


where dM{y-,i) denotes the standard deviation of the observations {ym,j}m=i loca¬ 
tion i G 1 : di. Simulations are conducted for a G {0.005,0.01,0.03,0.05} and for the 
sequence of temperatures {Tn'^)n>i given in ^ and for {Tn ^^)n>i defined by = 


Tq^^/ log(n-|-As already mentioned (see Section 4.2), the sequence {Tn'^)n>i 


is such that convergence results for SA on unbounded spaces exist but the constants 
and are model dependent and intractable. In this simulation study, we take 


Tq = 5 000, Tq = 0.1 and = 100. These values are chosen based on some pilot 
runs of SA and QMC-SA fo A = 0.1. Note also that the values of Tq^\ and are 


such that 
in this numerical study. 


T^\ where N = 2^^ is the number of function evaluations we consider 


Figures 2a 2b shows the values of min{(0;,(x"'), n G 0 : A^} obtained by SA and QMC- 


SA when for A = 0.1 and for 1000 different starting values xq = ((^o,i:2,'2o,i:(ii) sampled 
independently in X 2 as follows: 


~ ^(0, 2), (^ 0,2 ~ ^(0, 2), 2;o,i ~ ^(0,1), i G 1 : di. 
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Looking first at Figure |2b[ we observe that SA performs relatively well when the 
sequence {Tn^'^)n>i is used (together with a G {0.01,0.03}). The good performance of 
SA in this case is not surprising since and are calibrated such that SA works 
well when A = 0.1. However, we remark that, for this sequence of temperatures, QMC- 
SA outperforms SA for any value of a. The results obtained for the sequence {Tn^"^)n>i 
are presented in Figure If in this case SA performs poorly, and as for the sequence 
{Th^^)n>i, QMC-SA provides a small error for the vast majority of the 1000 different 
staring values (in particular when a G {0.005,0.01}). 

In practice, one often minimizes (f\ for different values of the A parameter, which 
determines the regularity of the optimal solution for z G It is therefore important 
that the optimization method at hand remains efficient for different values of A. To 
evaluate the sensitivity of QMC-SA and SA to this parameter. Figures 2c]|2d show the 
results obtained A = 0.01. For both sequences of temperatures, we observe that SA 
is much less effici ent than when A = 0.1. In particular, the results for the sequence 
{Tn^'^)n>i (Figure 2d) suggest that SA is very sensitive to the cooling schedule in this 
example. On the contrary, QMC-SA performs well in all cases and outperforms (from 
far) SA for the two sequences of temperatures and for all values of a we have chosen for 
this numerical study. 

The main message of this example is that the performance of QMC-SA on unboun¬ 
ded spaces is very robust to different choice of step-size a and of the cooling schedule. 
Consequently, tuning QMC-SA is much simpler than for SA algorithms. The results of 
this subsection seem to indicate that the convergence of QMC-SA on unbounded spaces 
could be obtained under the same condition for {Tn)n>i than for compact spaces (see 
Theorem]^. In particular, this suggests that there exists a universal sequence of cooling 
schedules which ensure the convergence of QMC-SA on non-compact space. However, 
further research in this direction is needed. 


7. Conclusion 

In this paper we show that the performance of simulated annealing algorithms can be 
greatly improved through derandomization. Indeed, in the extensive numerical study 
proposed in this work we never observe a situation where SA performs well while QMC- 
SA performs poorly. In addition, in the vast majority of the scenarios under study, 
QMC-SA turns out to be much better than plain Monte Carlo SA. This is particularly 
true in the high dimensional example proposed in this work where in most cases plain 
SA fails to provide a satisfactory solution to the optimization problem, while QMC-SA 
does. 

Our theoretical results also advance the current understanding of simulated anneal¬ 
ing and related algorithms, demonstrating almost sure convergence with no objective- 
dependent conditions on the cooling schedule. These results also hold for classical SA 
under minimal assumptions on the objective function. Further, the convergence res¬ 
ults extend beyond SA to a broader class of optimization algorithms including threshold 
accepting. 
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Future research should study QMC-SA on non compact spaces and extend QMC-SA 
to other types of Markov kernels. Concerning this hrst point, it is of great interest 
to check if the consistency of QMC-SA on unbounded spaces can, as it is the case for 
compact spaces, be guaranteed for a universal (i.e. not model dependent) sequence of 
temperatures. Our simulation results suggest that this is indeed the case and therefore 
should encourage research in this direction. 

In many practical optimization problems, and in particular in those arising in statistics, 
the objective function is multimodal but differentiable. In that case, stochastic gradient 
search algorithms (see, e.g., Gelfand and Mitter 1991 1993) are efficient alternatives to 


SA which, informally, correspond to SA based on a Langevin type Markov transitions. 
Intuitively, combining the information on the objective function provided by its gradient 
with the good equidistribution properties of QMC point sets should result in a powerful 
search algorithm. We leave this exciting question for future research. 
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A. Proof of Lemma [1] 


Let n G N, (x,x') G bx = 0.5 and b G {Q,bx]- Then, by Assumption (Ai) 
F^^(x;, uj*) G Bsi'x.') if and only if u” G FV(x, i?5(x')). We now show that, for b small 
enough, there exists a closed hypercube VF(x, x',J) C [0,1)'^ such that VF(x, x',J) C 
Fk{x, Bs{x')) for all x G i?^,^(5)(x), with VKi') as in the statement of the lemma. 

To see this note that, because K{x,dy) admits a density iL(y|x) which is continuous 
on the compact set A^, and using Assumption (A 2 ) it is easy to see that, for z G 1 : d, 
Aii( 2 /i|?/i:i-i,x) > K for all (x, y) G A^ and for a constant K > 0. Consequently, for any 
b G [0,0.5] and (x, y) G A^, 


F'Ki {xi + hjx, yi-i-i) - Fxi (x' - Jjx, yi-.i-i) > Kb, Vi G 1 : d 


( 6 ) 


where (-jx, yi-,i-i) denotes the CDF of the probability measure Ki(x, dz/Q, with 

the convention that FVj(-|x, = FV^(-|x) when i = 1. Note that the right-hand 

side of (© is Kb and not 2Kb to encompass the case where either x[ — b ^ [0,1] or 
x( -|- d 0 [0,1]. (Note also that because b < 0.5 we cannot have both x( — d 0 [0,1] and 
x[ + b^ [ 0 , 1 ].) 

For i G 1 : d and b' > 0, let 


uii{b') = sup \FKi (?/i|x, yi,i-i) - Fk, (y'|x', y'i,i_i) 

(x,y)eA’2,(x',y')eA’2 

||x-x'||ooV||y-y'||oo<i5' 


be the (optimal) modulus of continuity of (•[•). Since Fk^ is uniformly continuous 

on the compact set [0, l]'^"’"*, the mapping is continuous and 0 Ji{b') —)• 0 as d' —)• 0. 
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In addition, because Fk^ (-lx,yi;j_i) is strictly increasing on [0,1] for all (x,y) G 
uji{-) is strictly increasing on (0,1]. Let K be small enough so that, for i G 1 : d, 
^.2bK5x < Wi{l) and let dj(-) be the mapping z G (0,(5;i:’] i—)• 6i{z) = uj~^{0.25Kz). 
Remark that the function 6i{-) is independent of (x, x') G X^, continuous and strictly 
increasing on (Ojdu’] and such that 5i{5') —?• 0 as <5^ —>• 0. 

For X G X, 6' > 0 and d' > 0, i G 1 : d, let 

R 5 ,(x) = {x G [0,1]* : ||x - xuilloo < d'} n [0,1]* 


= {x G [0, 1 ]* : \xj - Xj\ < 5j, j Gl:i}n [0, 1 ]*. 

Then, for any d' > 0 and for all (x, G x i?l“J^(x'), we have 

\FKi (x' + d'|x, - Fxi (x' + d'|x, Xi,j_i) I < 0.25.Rd (7) 

WKi (x' - d'|x, yi-.i-i) - Fxi (x' - d'|x, xi,i_i) I < 0.25.Rd. (8) 

For 1 G 1 : d and d' G (0, d;^^], let di(d') = di(d') A d' and note that the function dj(-) 

is continuous and strictly increasing on (0,d;f]. Let d^ = 6^(6) and define recursively 

di = di(dj+i), i G 1 : (d — 1), so that d > d^ > • • • > di > 0. For i G 1 : d, let 

Xj(x, x', di:i) = sup Fk, (x' - di|x, 


and 


Xi(i,x',di:i) = inf Fk, (x'+ di|x,yni_i) 

(x,yi;i_i)eB5i(x)xB5^.._j(x') 


Then, since F/^.(-I-) is continuous and the set F5^(x) x F5^.-_,^(x') is compact, there exists 
points (x*,?/*,^_^) and (x*,y^.^_^) in ^^^(x) x F^j.._j(x') such that 


Li(x,x',di:i) = FK,ix[ - di|x*,y*^._^). 


ni(x,x',di:i) = Fk, (x' + 5i\yL\y\.^) . 


In addition, by the construction of the dj’s, F^i(x) x F 5 j,._j(x') C ^(x) x F| ^(x') 
for all z G 1 : d. Therefore, using (|^-([^, we have, for all z G 1 : d, 

z;i(k,x',di:i) - Xi(k,x',di;i) = Fi^^(x' + 5i\^\y\,i_i) - Fx^ix^ - 5i\x\y^^ ._^) 

> Fk, (x' + di|x, x[.i_^) - Fk, (x' - di|x, x'^.^) - 0.5Fdi 

> 0.5Fdj 

> 0 . 


Consequently, for all z G 1 : d and for all (x,G F5^(x) x F5^.._j(x'), 

Z;j(x,x',dl:i),Xi(x,x',dl:i) C Fx.ix'i - di|x, Z/i:j_i),Fft:.(x' +di|x, Z/i:j_i) 
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Let Sg = 0.5K5i. Then, this shows that there exists a closed hypercube lT(x, x', J) of 
side Sg such that 

IT(x,x',(5) C Fk{x,Bs^.^{x)) C Fk{x,Bs{x)), Vx E fi^^( 5 )(x) 

where we set vk{S) = (5i. Note that vk{5) E (0,5] and thus vk{S) —>■ 0 as 5 —)• 0, as 
required. In addition, vk(-) = 5i o ... 6d{-) is continuous and strictly increasing on (0, 6x] 
because the functions 5i(-), i E 1 : d, are continuous and strictly increasing on this set. 
Note also that VKi') does not depend on (x,x') E 
To conclude the proof, let 


ks = \t + d - dlog{Ss/3)/log b] (9) 

and note that, if 6 is small enough, ks > t + d because —>■ 0 as 5 —>■ 0. Let 5k be the 
largest value of 5' < 5x such that /cy > t + d. Let 5 E (0, 5k] and ^ t : {t + d) he 

such that {ks — ts^d)/d is an integer. Let {E{j,6)}^l^ be the partition of [0,1)'^ into 
elementary intervals of volume b^s,d-ks gQ that any closed hypercube of side 5 ^ contains 
at least one elementary interval E{j,6) for a j E 1 : Hence, there exists a 

jx,x' £ 1 : such that 

E{jk,x',,5) C lT(x,x',5) C Ek{^,Bs{x')), Vx E H„( 5 )(x). 


Let a E N and note that, by the properties of {t, s)-sequences in base b, the point set 
^ 1 is a {t,ks,d)-net in base b because ks > t. In addition, since ks > ts^d ^ 


t, the point set ^ ^ is also a {ts^d,ks,d)-net in base b (Niederreiter 


1992 


Remark 4.3, p.48). Thus, since for j E 1 : b^^ the elementary interval E{j,6) has 
volume the point set ^ therefore contains exactly b^^d > b^ points 

in E{jsf_^y^'^, 5) and the proof is complete. 


B. Proof of Lemma [2] 

Using the Lipschitz property of for all i E 1 : d, conditions 0 and Q in 

the proof of Lemma [l| hold with di{d) = 5{0.25K/Ck), i E 1 : d. Hence, we can 
take vk{5) = 5{h.2bK /CkY ^ ^ thus S_s = 50.5.R(1 A (0.25.R/C'i<')'^) . Then, the 
expression for ks follows using while the expression for 5k < 0.5 results from the 
condition ks '>t + d for all 5 E (0, 5k]- 


C. Proof of Lemma 


We first state and prove three technical lemmas: 


Lemma 6. Let X = [0,1]'^ and K : X ^ V{X) 
Assumptions (Ai) (A 2 )\ Then, for any 5 E (0,5ii-], 


be a Markov kernel whieh verifies 
with 5 k CIS in Lemma and any 
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(x,x') G , there exists a closed hypercube Vl^(x, x',(5) C [0,1)'^ of side Ss = 2.5K5, 
with K = maXjgi.rf{supx,ygA’-f^i(yil?/i:*-i)X)}, such that 

F/^(x,5^^(5)(x')) C T|"(x,x',(5), Vx G S^^(5)(x) (10) 

where vk{') is as in Lemma^ 

Proof. The proof of Lemma is similar to the proof of Lemma Below, we use the 
same notation as in this latter. 

Let 6 G (0,5/^], (x, x') G X^ and note that, for any (x,y) G X'^, 

PKiix'i + 5\x,yi-,i-i) - Fxiix'i - 5\x,yi-,i-i) < 2Kd, i € 1 : d. 

Let 0 < 5i < • • • < < (5 be as in the proof of Lemma and, for i € 1 : d, define 

Ui(x,x',5i,i) = inf Fx.ix'i-6i\x,yi.,i_i) 

(x, y)eB„^( 5 )(x),xB 5 j.._,^(x') 


( 11 ) 


and 


nj(x,x',5i:i) = sup Fk.(x'+ 5i|x,yi;i_i). 

(x, y)eB„^(^) (x') 


Let i G 1 : d and (x*,y*), (x*,y*) G i?^j^( 5 )(x) x i? 5 j,._j(x') be such that 

«i(x,x',di:i) = - (5i|x*,y*.._^), Ui{x,x',6) = Fk^(x'+ di|x\ 

Therefore, using 0, 0 and ( [lol ), we have, Vi G 1 : d, 

0 < Ui(x,x',di:i) - Ui{x,x,5i.,i) = Fx.ix'i + 6i\x\y\.i_-^) - TV,(x' - 

< FK,(x'i + dj|x,Xi,j_i) - FK,(x'i - dj|x,Xi,j_i) + 0.5.^di 

< di(2i? + 0.5iL) 

< 2.56iK 

where iV < .^ is as in the proof of Lemma (Note that Mi(x, x', di,*) — nj(x, x', di;*) 
is indeed strictly positive because TVj(-|x, yi;j_i,) is strictly increasing on [0,1] for any 
(x, y) G X^ and because dj > 0.) 

This shows that for all x G B.„^( 5 )(x) and for all y G we have 

[FKiix'i-5i\x,y),FKi{xi +6i\x,y)] C [nj(x, x, d), ni(x, x, d)], Vi G 1 : d 
and thus there exists a closed hypercube lT(x, x',d) of side Ss = 2.5dK such that 
FK{x,Bs^.^_,{x')) C lT(x,x',d), Vx G 5^^(5)(i). 

To conclude the proof of Lemma note that, because Uic(d) < dj for all i G 1 : d, 

Fic(x,5^^(5)(x')) C Fr'(x,B5j^,_j(x')). 

□ 
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Lemma 7. Consider the set-up of Lemma^ and, for (p, a, k) G Ni]., let 

Kk = G {a6^ ..., (a + 1)6^ - 1} : x" / < ^*} 

n |Vn G {ab ^,..., (a + 1)6^ — 1} : x” G 

Then, for all for all k £ N, there exists a G N such that ?!■( flasN-^a fc) ~ 

P>Pl 

Proof Let e > 0, a G N and / G M be such that I < ip*, and for A; G N, let E(k) = 

{E{j, k)}j^i be the splitting of X into closed hypercubes of volume k~^. 

Let p' G N+, 6 = 2~P' and C E{6) be the smallest coverage of (Xi)^ by hypercubes 
in E{S); that is, l-P^^I is the smallest value in 1 : S~'^ such that (Xi),, C . Let 

J^S — ^ such that j £ if and only if E{j,6) £ P^g. We now bound |J*^| 

following the same idea as in He and Owen ( 2015| ). 

By assumption, there exists a constant M < oo independent of I such that M{Xi) < M. 
Hence, for any fixed re > 1 there exists a e* G (0,1) (independent of 1) such that 
^d{{Xi)e) < wM{Xi)e < wMe for all e G (0,e*]. Let e = 2~p, with p G N such that 
2~P < 0.5e*, and take 6e = 2“^“^. Then, we have the inclusions {Xi)^ C Cy^^pi C {Xi) 2 e 
and therefore, since 2e < e*. 


u 


^ ^d{{Xl)2e) ^ 

- \d{E{j,5,)) - 


of 


( 12 ) 


where the right-hand side is independent of 1. 

Next, for j £ j\g , let yT be the center of E{j, and VL^(j, 6^) = U /gji W(x^,x^', 5^), 

with W{-, •, •) as in Lemma p Then, using this latter, a necessary condition to move at 
iteration n -|- 1 of Algorithm 1 from a point x” G E{jn,Se), with jn £ J^g , to a point 
x«-+i ^ x" such that x"'+^ G E{jn+i, S^) for a jn-{-i £ ^ W\jn, dff). 

Let k^^ be the largest integer such that (i) < Sg^b^, with Ss^ = 2.5K5e, K < oo, as 

in Lemmaand (ii) {k — f)/d\s a positive integer (if necessary reduce e to fulfil this last 

condition). Let E'{5f) = {E'{k,5e)'\^^^i ^ be the partition of [0,1)^ into hypercubes of 
volume . Then, for all j G W^{j,de) is covered by at most hypercubes 

of.B'(<5,). 

Let e be small enough so that k^^ > t X dR. Then, using the properties of (t, s)/j- 
sequences (see Section [3.1[ ), it is easily checked that, for all n > 0, 

Pr(u^ G E\k, 6,)) < VA: G 1 : (13) 

Thus, using (12 and the definition of k^^, we obtain, for all j £ n> 0, 


Pr(u^ G W^{j,d,)) < 2'^\jlgjb^b^-^' < C*5„ C* = 2‘^Cb^+\2.5Kfb' 


didR 
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Consequently, using the definition of e and and the fact that there exist at most 2'^ 
values of j G such that, for n G N, we have x"’ G we deduce that, for a 

p* G N large enough (i.e. for e = 2“^ small enough) 

Pr(^;f J = l)< b’^2^C*2-P-\ V(a, k) G N^, V/ < (^*, Vp > p* 

implying that, for p > p*, 

P^{Elk) ^ b^2'^C*2-P-\ V(a, k) G N^. 

Finally, because the uniform random numbers z^’s in [0,1)^ that enter into the definition 
of {t, s)i{-sequences are IID, this shows that 

Pv[n'^+^ E^j^) <{b’^2^C*2-P-^)'^, y{a,m,k) gN^, ^p > p*. 

To conclude the proof, for A; G N let G (0,1) and > p* be such that 

bk2^c*2-p-^ < Pk, yp>pl. 


Then, Pr( CagN =Q for all p > as required. 


□ 


Lemma 8. Consider the set-up of Lemma 3. For k G N, let E{dk) = {E{j,k)}^‘^^i be 
the partition of [0,1)'^ into hypercubes of volume 6“'^^. Let k^ G {dR + f) : {dR + t + d) 
be the smallest integer k such [k — t)/d is an integer and such that [k — t)/d> R and, 
for m G N, let L^ = ..., (m + 1)6^^ — 1}. Then, for any 6 G (0,d/^] verifying 

ks > f + d + dR (with 5 k cind ks as in Lemma^, there exists a p{5) > 0 such that 

Pr(3n G Im- G E{j, ks - ts^)) > p{5), Vj G 1 : Vm G N 


where ts,d G t : {t d) is such that {ks — ts^d)/d G N. 

Proof. Let m G N and note that, by the properties of (t, s)/j-sequence, the point set 
is a (t, k^,d)-net in base b. Thus, for all j G 1 : this point set contains 

6 * points in E{j,k^ — t) and, consequently, for all j G 1 : b’^^, it contains = 

jjk -dR y 2 points in E{j,dR). This implies that, for all j G 1 : b^^, the point set 
{'^r} n^l„i contains b^^ dR points in E{j, dR) where, for all n G Inn, uniformly 
distributed in E{jn, dR) for a jn G 1 : b^^. 

In addition, it is easily checked that each hypercube of the set E{dR) contains 

y^S-is,d-dR > ^ks-t-d-dR y ^ 


hypercubes of the set E{ks — ts,d), where ks and ts^d are as in the statement of the 
lemma. Note that the last inequality holds because <5 is chosen so that ks > t + d + dR. 
Consequently, 

Pr(3n Glm-. G ^(j, ks - ts,d)) > p{S) := > Q, Vj G 1 : 

and the proof is complete. □ 
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Proof of Lemma^ To prove the lemma we need to introduce some additional notation. 
Let O = [0,1)^, B{[0, 1)) be the Borel cr-algebra on [0,1), T" = B{[0, 1))®^ and P be the 
probability measure on defined by 

P(A) = H X,iA,), {Au B{[0, 1))®^. 

ieN 

Next, for a; G Q, we denote by (U^(a;))^^Q the sequence of points in [0, defined, for 
all n > 0, by (using the convention that empty sums are null), 

R 

U^(w) = (L'Rq(w), ■ ■ •, = ^+ h ^ujnd+i, 

k=l 

Note that, under P, (U^)^>g is a (t, d) ji-sequence in base b. Finally, for a; G fi, we denote 
by (x”)^^g the sequence of points in X generated by Algorithm when the sequence 
(US(‘^))n>0 is used as input. 

Under the assumptions of the lemma there exists a set Qi € J- such that P(rii) = 1 
and 

G M such that lim Vw G fii. 

Let cj G rii. Since ip is continuous, for any e > 0 there exists a £ N such 

that x” G {X^^)f: for all n > A^,e, where we recall that = {x € X : 3x' G 

X,^^ such that ||x — x'||oo < e}- In addition, because (p is continuous and X is compact, 
there exists an integer G N such that we have both lime_^o Pui,e = oo and 

^ ^ (I'l) 

Next, let X* G A" be such that ip{x*) = ip*■, G {dR + 1) : {dR +1 + d) be as in Lemma 
[^and, for (p, a, k) G N^, let 

= {a; G U : 3n G {a6^ ..., (a + l)b^ - 1} : / xfp(xf “i) < p*} 

n jw G n : Vn G {a6^,..., (a + 1)6^ - 1} : x^ G ■ 

Then, by Lemma there exists a p* G N such that P( Plagis^ ^ak^) ~ ^ P P P*, 

and thus the set Q 2 = f3p>p*[X \ verifies P(f^ 2 ) = 1- Let 1^2 = H 1^2 so 

that P(ri 2 ) = 1- 

For cj G ri 2 let > 0 be small enough so that, for any e G (0, e^,;], we can take Puj^e > P* 
in ( |14[ ). Then, for any cj G O 2 such that (p^ < ip*^ there exists a subsequence of 

(nr)m>i such that, for all f > 1, either 

x" = Vn G Im, ■■= {rriib^^ ,..., (m* + 1)6^^ - l} 

or 

3n G such that xf, 0 - 1 ) 
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Assume first that there exist infinitely many z G N such that (15) holds. Then, by 


(14), this leads to a contradiction with the fact that w G 112 C Therefore, for any 


u) £ 0.2 such that < tp* there exists a subsequence of {m)m>i such that, for 

a i* large enough. 


x,„ = x„ 


-1 


Vn G L 


niij 


Mi > 


(16) 


Let 0.2 = {u ^ 0.2 <Pui < p>*} ^ O 2 ■ Then, to conclude the proof, it remains to show 
that P(ll 2 ) = 0. We prove this result by contradiction and thus, from henceforth, we 
assume lP(li 2 ) > 0 . 

To this end, let x* G be such that (^(x*) = (/?*, x G and 6 G (0, 6k], with 6k as in 
Lemmaj^ Then, using this latter, a sufficient condition to have T^^(x, U^(ti;)) G Bs{x.*), 
n > 1, is that U^(a;) G iy(x,x*,5), with ]T(-,-,-) as in Lemma From the proof of 
this latter we know that the hypercube lF(x,x*,5) contains at least one hypercube of 
the set E{ks — ts^), where ts,d G t : {t + d) is such that {ks — ts^d)/d G N and, for A; G N, 
E(dk) is as in Lemma Hence, by this latter, for any 6 G (0,(5*], with <5* such that 
ks* > t + d + dR (where, for 6 > 0, ks is defined in Lemma Q, there exists a p{6) > 0 
such that 


G H : 3n G Im, (x, U^(a;)) G Bs{'x*)J > p{6), V(x, m) £ X x 
and thus, using (16), it is easily checked that, for any 6 G (0,(5*], 

P 2 (^a; G O 2 '■ (x”"^, U^(u;)) G Bsix*) for infinitly many n G =1 


N 


where P 2 denotes the restriction of P on O 2 (recall that we assume P(ll 2 ) > 0 ). 

For 6 >0, let 

0 '^ = |w G 112 : U^(a;)) G ^^(x*) for infinitly many n G n| 

and let p* G N be such that 2 ~ p * < 6 *. Then, the set H' = np>p*ll 2 _p verifies P2(ll0 = 1- 
To conclude the proof let a; G HL Then, because (p is continuous and (p^ < y?*, there 
exists a 6p^ > 0 so that ip{x) > (p for all x G B^_ (x*). Let 6p^ := > 6p^/\6k for an 

integer > p*. Next, take e small enough so that we have both Bs- (x*) n = 0 

and (y 9 (x) > (p{x') for all (x,x') G Bs^^{x*) x 

Using above computations, the set B^ (x*) is visited infinitely many time and thus 
(^(x2) > (pi^ for infinitely many re G N, contradicting the fact that </^(x”) —>■ (p^^ as 
re —)> 00 . Hence, the set H' is empty. On the other hand, as shown above, under the 
assumption P(ll 2 ) > 0 we have P2(ll0 = 1 and, consequently. O' 7 ^ 0 . Therefore, we 
must have P(ll 2 ) = 0 and the proof is complete. 


D. Proof of Theorem [2] 

Using Lemmas 1^ and we know that (/^(x”) —)> (^ G M and thus it remains to show that 

— 3k 

<p = (p . 
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Assume that (p ^ Lp* and, for e = 2“^, p G N+, let G N, Pe and > 0 be as in the 
proof of Lemma (with the dependence of A^e, p^ and of 5,^ on a; G fi suppressed in the 
notation for obvious reasons). 

Let X* G be a global maximizer of p and n = — 1 with G N such 

that n > Nf:. For /c G N, let E{k) = {E{j,k)}j^i be the splitting of [0,1] into closed 
hypercubes of volume k~^. Then, W Lemma a necessary condition to have a move at 
iteration n' + 1 > 1 of Algorithm from x^' G {Xp)e to x'^''^^ ^ x^\ G is 

that 

ut^W{e)-= U W{x\x^\el2) 


where, for j G 1 : (e/2) x^ denotes the center of E{j,el2), J ‘^^12 is as in the proof of 


Note that, using ( |12[ ) with d = 1, \ J‘^^i 2 \ — 


Lemmaj^and W{-, ■, ■) is as in Lemma 
for a constant C* < 00 (independent o: 

Let ' be the largest integer k > t such that b^~^ > ^e /2 as in Lemma 


and let e be small enough so that b^"" > 2'^C*h^. The point set 


1 '—1 


n>=a„b'^'“ 


IS 


a {t,k^'‘,d)-net in base b and thus the set LF(e) contains at most 2^C*h^ points of this 
points set. Hence, if for n > only moves inside the set (A’,^)e occur, then, for a 
n G : [{an + 1)5^'*'' — r?e — 1)), the point set is such that x^' = x” for all 

n G n : (n + p^), where rj^ > [ 23 ^^ 12 ^]5 that 00 as e —)• 0. 


Let /cg be the largest integer which verifies p^ > 2b^o so that contains at 

least one [t, /cg, d)-net in base b. Note that /cg —)> 00 as e —)• 0, and let e be small enough 
so that /sg > ks^, with ks as in Lemma Then, by Lemma there exists at least one 
n* G (h+ 1) : (n + p^) such that y^* := Fj/^(x”,n^) G Bs-{x*). Since, by the definition 
of for all (x,x') G Bs^{x*) x {Xp)e, and for e small enough, we have p{x) > <p(x'), it 
follows that p{y^ ) > p{x'^). Hence, there exists at least one n € h : (n + p^) such that 
x" 7 ^ x”, which contradicts the fact that x” = x for all n G h : (h + p^). This shows that 
(p is indeed the global maximum of p. 
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6.1 


E. Additional figures for the example of Section 



(c) (d) 


Figure 3: Minimization of (pi defined by (|^ for 1000 starting values sampled inde¬ 
pendently and uniformly on Xi. Results are presented for the Cauchy kernel 
{Kn'^)n>i with {Tn^^)n>i (top plots) and for For m = 1,2, simula¬ 

tions are done for the smallest (left plots) and the highest (right) value of 
given in (|^. The plots show the minimum number of iterations needed for SA 
(white boxes) and QMC-SA to find ax £ Xi such that <,5i(x) < 10“^. For each 
starting value, the Monte Carlo algorithm is run only once and the QMC-SA 
algorithm is based on the Sobol’ sequence. 
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(c) (d) 


Figure 4: Minimization of (p\ defined by Q for 1 000 starting values sampled independ¬ 
ently and uniformly on Results are presented for the Gaussian kernel 

(Ki^^)n>i with (top plots) and for (Ti^^)n>i- For m = {1,3}, sim¬ 

ulations are done for the smallest (left plots) and the highest (right) value of 
given in (Q. The plots show the minimum number of iterations needed 
for SA (white boxes) and QMC-SA to find a x G Ai such that <^i(x) < 10“®. 
For each starting value, the Monte Carlo algorithm is run only once and the 
QMC-SA algorithm is based on the Sobol’ sequence. 
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